Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
General code clean-up of xc_linux.c.
Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Change the naming scheme of hap_gva_to_gfn to match that of guest_walk_tables
(i.e. hap_gva_to_gfn_n_levels instead of hap_gva_to_gfn_nlevel)
Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Fix a reference to X86EMUL_OKAY which was hardcoded as a 0 instead.
Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
Keir Fraser [Thu, 17 Dec 2009 06:22:02 +0000 (06:22 +0000)]
hvm: handle PVRDTSCP mode
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 16 Dec 2009 22:26:38 +0000 (22:26 +0000)]
hvm: Clean up RDTSCP/TSC_AUX handling.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 16 Dec 2009 22:26:15 +0000 (22:26 +0000)]
Turn tmem (transcendent memory) support on by default.
Tmem has been in-tree for about seven months, but disabled
by default. Enabling it should be entirely harmless
unless a running PV domain has been tmem-modified.
I'd like to confirm that by enabling it now, so that
it can be enabled by default for the 4.0.0 release.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 16 Dec 2009 16:48:17 +0000 (16:48 +0000)]
AMD IOMMU: Fix a xen crash on amd iommu systems
Changeset 20514 implemented deallocation for msi interrupt remapping
entries. This patch adds the same support for amd iommu to fix a xen
crash on amd iommu systems.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Wed, 16 Dec 2009 16:47:31 +0000 (16:47 +0000)]
AMD IOMMU: Reset event logging when event overflows
Restart iommu event logging if EventOverFlow bit is set to prevent
event logging from being disabled after event overflows.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Wed, 16 Dec 2009 16:42:44 +0000 (16:42 +0000)]
pygrub: add ext4 support
This is a port of the following two patches:
http://patches.ubuntulinux.org/g/grub/extracted/ext4_support.diff
http://patches.ubuntulinux.org/g/grub/extracted/ext4_fix_variable_sized_inodes.diff
Signed-off-by: Mark Johnson <mark.johnson@sun.com>
Keir Fraser [Wed, 16 Dec 2009 12:45:18 +0000 (12:45 +0000)]
x86_emulate: Emulate RDTSCP instruction.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 16 Dec 2009 12:32:35 +0000 (12:32 +0000)]
iommu: Actually clear IO-APIC pins on boot and shutdown when used with an IOMMU
When booted with iommu=on, io_apic_read/write functions call into the
interrupt remapping code to update the IRTEs. Unfortunately, on boot
and shutdown, we really want clear_IO_APIC() to sanitize the actual
IOAPIC RTE, and not just the bits that are active when interrupt
remapping is enabled. This is particularly a problem on older
versions of Xen which used the IOAPIC RTE as the canonical source for
the IRTE index. In that case, clear_IO_APIC() actually causes
whatever happens to be stored in the RTEs to be used as an IRTE index,
which can come back and bite us in ioapic_guest_write() if we attempt
to remove an interrupt that didn't actually exist. Current upstream
appears less susceptible to errors since the IRTE index is stored in
an array, but it's still a good idea to sanitize the IOAPIC state.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 16 Dec 2009 12:23:21 +0000 (12:23 +0000)]
HVM RDTSCP fixes
- Put the guest rdtscp cpuid logic in xc_cpuid_x86.c.
- MSR_TSC_AUX's high 32bit is reserved, so only write the low 32bit.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Keir Fraser [Wed, 16 Dec 2009 12:21:43 +0000 (12:21 +0000)]
XSM: Restore policy backwards compatibility
This restores backwards compatibility with older XSM policy. Policies
built with older versions of checkpolicy will once again work in Xen.
Signed-off-by: Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>
Keir Fraser [Wed, 16 Dec 2009 12:20:57 +0000 (12:20 +0000)]
pygrub: fix attribute error when not found parser
Signed-off-by: Wei Kong <weikong.cn@gmail.com>
Keir Fraser [Wed, 16 Dec 2009 12:20:08 +0000 (12:20 +0000)]
xenoprof: Fix support for active domains
If a user tries to use opcontrol with option --active-domains in dom0
and then run opcontrol in a guest, no samples are generated. When the
guest calls the xenoprof interface it resets the internal Xenoprof
state machine and profiling does not start
Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>
Keir Fraser [Mon, 14 Dec 2009 11:58:45 +0000 (11:58 +0000)]
xen-detect: Avoid dumping core
F12 introduces a tool to automatically report bugs when there are core
dumps. Since xen-detect relies on fork+waitpid in order to trap a
SIGILL from a child, every time someone runs xen-detect on a bare
metal kernel a bug is reported into Red Hat's Bugzilla. :-)
However, even without this contingent need, leaving core dumps around
is not nice. So this patch just traps SIGILL using
signal/setjmp/longjmp, without the need to fork.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 14 Dec 2009 11:38:15 +0000 (11:38 +0000)]
mini-os: Fix a compilation error in xencons_ring when !HAVE_LIBC
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Keir Fraser [Mon, 14 Dec 2009 09:51:07 +0000 (09:51 +0000)]
mini-os: Fix memory leaks in blkfront, netfront, pcifront, etc.
The return value of Xenbus routines xenbus_transaction_start(),
xenbus_printf(), xenbus_transaction_end(), etc. is a pointer to an error
message. This pointer should be passed to free() to release the
allocated memory when it is no longer needed.
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Keir Fraser [Mon, 14 Dec 2009 09:48:47 +0000 (09:48 +0000)]
x86_32: Fix build after RDTSCP and memory hotplug changes.
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Keir Fraser [Mon, 14 Dec 2009 09:36:26 +0000 (09:36 +0000)]
Fix bug in c/s 20332 "Add commands to hotplug usb devices to hvm guests"
Signed-off-by: James Song Wei <jsong@novell.com>
Keir Fraser [Mon, 14 Dec 2009 09:31:00 +0000 (09:31 +0000)]
HVM vcpu add/remove: parse vcpu_avail to Qemu
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Disable qemu cmdline option until our qemu supports it.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 14 Dec 2009 09:25:47 +0000 (09:25 +0000)]
HVM vcpu add/remove: parse 'vcpu_avail' to firmware and set up madt
accordingly
-- currently firmware gets 'vcpus' from xend; this patch also passes
'vcpu_avail' to firmware;
-- set up madt 'lapic' subitems of processors according to vcpus and
vcpu_avail, which ultimately come from config;
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 14 Dec 2009 09:14:26 +0000 (09:14 +0000)]
HVM vcpu add/remove: setup dsdt infrastructure by mk_dsdt.c for vcpu add/remove
In order to support HVM vcpu add/remove, we need to set up the dsdt
infrastructure.
-- mk_dsdt.c auto-generates the related asl code when
compiling.
-- It defines processor-related objects and control methods (_MAT,
_EJ0, _STA, etc).
-- It also defines GPE _L02 and a Notify control method for the SCI
interrupt, which will trigger the HVM acpi driver to add/remove a cpu.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 14 Dec 2009 08:00:26 +0000 (08:00 +0000)]
PoD: correct assertion and remove noisy messages
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Mon, 14 Dec 2009 07:59:40 +0000 (07:59 +0000)]
docs: add a document about guest cpuid configuration
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 14 Dec 2009 07:59:12 +0000 (07:59 +0000)]
xend: fix empty 'cpus' parsing
/etc/xen/xmexample.hvm says "" means "leave to Xen to pick", but we
currently get an "Error: string index out of range".
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
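The empty-string case in the entry above can be sketched as follows. This is only an illustration: the helper name parse_cpus and the accepted syntax (comma-separated CPU numbers, a-b ranges, ^n exclusions) are assumptions, not xend's actual parser.

```python
def parse_cpus(spec):
    """Parse a 'cpus' string into a sorted list of CPU numbers.

    An empty (or whitespace-only) string yields an empty list, read here
    as "no restriction, leave it to Xen". Hypothetical helper.
    """
    spec = spec.strip()
    if not spec:
        # Guard first: indexing spec[0] on "" is the kind of access that
        # raises "string index out of range".
        return []
    include, exclude = set(), set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        target = include
        if part[0] == "^":
            # "^n" or "^a-b" removes CPUs from the result.
            target, part = exclude, part[1:]
        if "-" in part:
            lo, hi = part.split("-", 1)
            target.update(range(int(lo), int(hi) + 1))
        else:
            target.add(int(part))
    return sorted(include - exclude)
```

The point of the sketch is the ordering: the emptiness check happens before any character of the string is inspected.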
Keir Fraser [Mon, 14 Dec 2009 07:58:47 +0000 (07:58 +0000)]
xend: fix a typo introduced by changeset 20621:f9392f6eda79
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 14 Dec 2009 07:58:15 +0000 (07:58 +0000)]
Fix bug in c/s 20332 "Add commands to hotplug usb devices to hvm guests"
Signed-off-by: James Song Wei <jsong@novell.com>
Keir Fraser [Mon, 14 Dec 2009 07:57:23 +0000 (07:57 +0000)]
Disable watchdog in dump_registers
Avoids triggering watchdog if serial port output is slow.
Signed-off-by: Andrew Lyon <andrew.lyon@gmail.com>
Keir Fraser [Mon, 14 Dec 2009 07:56:21 +0000 (07:56 +0000)]
Fix losetup -f not working on SLES10
Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
Keir Fraser [Mon, 14 Dec 2009 07:55:35 +0000 (07:55 +0000)]
Fix clock for XCP Windows PV drivers on restore
This fixes a timekeeping issue for 32 bit guests running XCP Windows
paravirtual drivers on a 64 bit hypervisor where their clock was set
to the 1970s after live migration or restore. Thanks to Paul Durrant
for helping track this down.
From the original XCP patch:
Arrange that the wallclock time fields in the shared_info structure
are set correctly in 32 bit HVM guests on a 64 bit hypervisor. HVM
guests on a 64 bit hypervisor always start with a 64 bit shared info,
and then change to a 32 bit one if they're using 32 bit drivers. The
32-bit and 64-bit shared info structures put their wallclock times in
slightly different places, and so the wallclock time needs to be
regenerated when you do the conversion.
It can be argued that we should convert the other fields of shared
info at the same time (e.g. if an event channel is pending beforehand,
it should be pending afterwards), but that's much harder to arrange,
because the 32 bit structure can't represent all the states which the
64 bit one can. Just setting the time seems to be sufficient for
our purposes.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Signed-off-by: Keith Coleman <keith@scaltro.com>
Keir Fraser [Mon, 14 Dec 2009 07:54:53 +0000 (07:54 +0000)]
cpuidle: fix the menu governor to enhance IO performance
this is a revised version of linux upstream commit
69d25870f20c4b2563304f2b79c5300dd60a067e:
"
cpuidle: fix the menu governor to boost IO performance
Fix the menu idle governor which balances power savings, energy
efficiency and performance impact.
The reason for a reworked governor is that there have been serious
performance issues reported with the existing code on Nehalem server
systems.
To show this I'm sure Andrew wants to see benchmark results:
(benchmark is "fio", "no cstates" is using "idle=poll")
            no cstates   current linux   new algorithm
1 disk      107 Mb/s     85 Mb/s         105 Mb/s
2 disks     215 Mb/s     123 Mb/s        209 Mb/s
12 disks    590 Mb/s     320 Mb/s        585 Mb/s
In various power benchmark measurements, no degradation was found by
our measurement&diagnostics team. Obviously a small percentage more
power was used in the "fio" benchmark, due to the much higher
performance.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"
In the Xen version, most of the logic is similar, with one exception:
Linux uses nr_iowait and loadavg to track pending I/O requests,
which are not visible to Xen, so Xen uses the do_irq frequency
to estimate the I/O pressure. This is not as accurate as Linux; a
better approach would be to convey the guest latency requirement to the
hypervisor via a virtual C state. This can be a future enhancement.
The detailed algorithm description is in the code comments. With this new
algorithm, fio benchmark performance improves ~5% with 1 disk, and no
power degradation is found in the idle case.
Signed-off-by: Yu Ke <ke.yu@intel.com>
Keir Fraser [Mon, 14 Dec 2009 07:52:22 +0000 (07:52 +0000)]
hvm: Fix CR0.WP=0 emulation. Don't take write emulation path for MMIO.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 14 Dec 2009 07:46:57 +0000 (07:46 +0000)]
Add RDTSCP instruction support for HVM VMX guest.
RDTSCP was introduced with the Nehalem processor on Intel platforms. Like
RDTSC, RDTSCP returns the TSC value; in addition, it returns the
low 32 bits of the TSC_AUX MSR. Currently the Linux kernel writes (node_id
<< 12 | processor_id) into that MSR, so that when a guest executes RDTSCP, it
also gets processor information. This instruction is supported
for HVM only when the hardware has this capability (indicated by
cpuid).
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
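As a rough illustration of the TSC_AUX layout mentioned in the entry above, the following sketch packs and unpacks a node number and a processor number using the shift of 12 quoted in the message. The function names are hypothetical, and the 32-bit mask reflects only the statement elsewhere in this log that the high 32 bits of MSR_TSC_AUX are reserved.

```python
NODE_SHIFT = 12                    # shift quoted in the commit message
CPU_MASK = (1 << NODE_SHIFT) - 1   # low 12 bits carry the processor number
LOW32_MASK = 0xFFFFFFFF            # only the low 32 bits are used

def pack_tsc_aux(node_id, cpu_id):
    """Build the value a kernel might write into TSC_AUX (illustrative)."""
    return ((node_id << NODE_SHIFT) | (cpu_id & CPU_MASK)) & LOW32_MASK

def unpack_tsc_aux(aux):
    """Recover (node_id, cpu_id) from the auxiliary value RDTSCP returns."""
    aux &= LOW32_MASK              # ignore anything in the reserved half
    return aux >> NODE_SHIFT, aux & CPU_MASK
```

For instance, node 1 and processor 3 pack to 0x1003, and unpacking 0x1003 gives back (1, 3).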
Keir Fraser [Mon, 14 Dec 2009 07:45:04 +0000 (07:45 +0000)]
Pvrdtscp: move write_rdtscp_aux() to paravirt_ctxt_switch_to()
Currently write_rdtscp_aux() is placed in update_vcpu_system_time(),
which is called by schedule() before context_switch(). This will break
the HVM guest TSC_AUX state because at this point, the MSR hasn't been
saved for HVM guests. So put the function at the point where a PV vcpu
is really scheduled in.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Keir Fraser [Fri, 11 Dec 2009 09:17:09 +0000 (09:17 +0000)]
docs: Fixes for README
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 11 Dec 2009 09:07:57 +0000 (09:07 +0000)]
Update Xen version to 4.0.0-rc1-pre
Keir Fraser [Fri, 11 Dec 2009 09:01:15 +0000 (09:01 +0000)]
mini-os: Fix memory leaks in xs_read() and xs_write()
xenbus_read() and xenbus_write() will allocate memory for error
message if any error occurs, this memory should be freed.
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 09:00:40 +0000 (09:00 +0000)]
libxenlight: Disable unneeded C++ binding for libconfig
If we want to avoid that a C++ compiler becomes a requirement for a
Xen build, we should disable the (unneeded) C++ library generation for
the embedded libconfig.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Fri, 11 Dec 2009 08:59:54 +0000 (08:59 +0000)]
tools: improve NUMA guest placement when ballooning
The "guest to a single NUMA node" constraint algorithm does not work
well when we do ballooning. Ballooning and NUMA don't play together
anyway, as Dom0 and thus ballooning is not NUMA aware, I am working on
this but it will not be ready for the Xen 4.0 release window. The
usual ballooning situation will result in an empty candidate list, as
no node has enough free memory to host the guest. In this case the
code will simply pick the first node: again and again, because all
nodes without enough memory will be ultimately penalized with the same
maxint value (regardless of the actual load). The attached patch will
change this to use a relative penalty in case of not-enough memory, so
that low-load low-memory nodes will be used at one point. A half
loaded node has shown to be a good value, as an unbalanced system is
much worse than non-local memory access for guests. Regardless of
that you should restrict the Dom0 on a NUMA system to a reasonable
memory size, so that ballooning is not necessary most of the time. In
this case the guest's memory will be NUMA local.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
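The difference between the two penalty schemes described above can be sketched like this. Everything here is an assumption made for illustration: the node representation (load, free_mb), the function names, and the penalty value passed in; it is not the xend placement code.

```python
import sys

def pick_node_maxint(nodes, need_mb):
    """Old behaviour as described: every node that is short on memory
    gets the same maxint cost, so ties resolve to the first such node."""
    def cost(i):
        load, free_mb = nodes[i]
        return load if free_mb >= need_mb else sys.maxsize
    return min(range(len(nodes)), key=cost)

def pick_node_relative(nodes, need_mb, penalty):
    """Relative penalty: a node short on memory costs load + penalty,
    so lightly loaded nodes still win among the short ones."""
    def cost(i):
        load, free_mb = nodes[i]
        return load if free_mb >= need_mb else load + penalty
    return min(range(len(nodes)), key=cost)
```

With three nodes that all lack memory, say [(8, 100), (1, 100), (4, 100)] and a 500 MB guest, the maxint variant always returns node 0, while the relative variant returns node 1, the least loaded one.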
Keir Fraser [Fri, 11 Dec 2009 08:58:06 +0000 (08:58 +0000)]
memory hotadd 7/7: hypercall support
The basic work flow to handle the memory hotadd is:
Update node information
Map new pages to xen 1:1 mapping
Setup frametable for new memory range
Setup m2p table for new memory range
Put the new pages to domheap
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:57:30 +0000 (08:57 +0000)]
memory hotadd 6/7: Allocate L3 table for whole direct mapping range if
memory hotplug is supported.
Hot-added memory may need a new L4 entry for 1:1 mapping. This patch
sets up all L4 entries for 1:1 mapping if memory hotadd is needed, so that
we don't need to sync the guest page table in the page fault handler.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:56:50 +0000 (08:56 +0000)]
memory hotadd 5/7: Sync changes to mapping changes caused by memory
hotplug in page fault handler.
In the compat guest situation, the compat m2p table is copied, not
directly mapped in L3, so we have to sync it. The direct mapping range
may change, and we need to sync it with the guest's table.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:56:04 +0000 (08:56 +0000)]
memory hotadd 4/7: Setup frametable for hot-added memory
We can't use alloc_boot_pages for memory hot-add, so change it to use
the pages range passed in.
One change worth noting: when memory hotplug is needed, we have to
set up the initial frametable aligned to the pdx index (i.e. the
pdx_group_valid), to make sure mfn_valid() still works after max_page is
no longer the maximum.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:55:08 +0000 (08:55 +0000)]
memory hotadd 3/7: Function to share m2p tables with guest.
The m2p tables should be shared with the guest as they will be mapped
read-only by the guest. This logic is similar to what happens in
subarch_init_memory(). But we need to check that the mapping is just set up.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:54:37 +0000 (08:54 +0000)]
memory hotadd 2/7: Destroy m2p table for hot-added memory when hot-add failed.
When we destroy the m2p table, it should not be in use, so we don't
need to consider cleaning the head/tail mapping that may exist before hot-add.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:53:57 +0000 (08:53 +0000)]
memory hotadd 1/7: Setup m2p table for hot-added memory
When new memory is added to the system, we need to update the m2p table
to cover the new memory range.
When memory is added, it is difficult to allocate contiguous pages, so we
allocate the memory from the newly added memory range. This also improves
locality in the NUMA situation.
We don't support 1G mapping for hot-added memory, because AFAIK currently
hot-plugged memory will not be that large.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:52:17 +0000 (08:52 +0000)]
PVUSB: xm/xend support
You can see the following slides to understand the usage.
http://www.xen.org/files/xensummit_intel09/PVUSBStatusUpdate.pdf
Limitations:
"xm usb-hc-create" accepts up to 16 ports, but the current usbfront
can work with up to 15 ports. This may be a bug and I'm preparing
to fix it.
This xm/xend support requires linux-2.6.18-xen.hg c/s 939 or above.
I recommend latest tip.
Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Keir Fraser [Fri, 11 Dec 2009 08:51:21 +0000 (08:51 +0000)]
docs: Example usage of pvrdtscp algorithm
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Fri, 11 Dec 2009 08:50:13 +0000 (08:50 +0000)]
x86: Allow HPET to set timers more sloppily by seeing each CPU's
acceptable deadline range, rather than just deadline start.
Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:47:51 +0000 (08:47 +0000)]
libxenlight: fix cd-insert cli arguments parsing
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:46:02 +0000 (08:46 +0000)]
libxenlight: add a cli option to exit right after domain creation
This patch adds a command line option in xl to exit right after domain
creation and not wait in background for the death of the domain.
Users should be aware that if they use this option, they always have
to destroy the domain manually after the guest shuts down.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:45:26 +0000 (08:45 +0000)]
libxenlight: fix two memory related issues
- LIBXL_MAXMEM_CONSTANT is 1MB but must be expressed in KB;
- xc_dom_linux_build should take target_memkb instead of max_memkb as
an argument.
Thanks to Andres for spotting the latter.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:44:33 +0000 (08:44 +0000)]
domain builder: multiboot-like module support
This defines how multiple modules can be passed to a domain by packing
them together into a "multiboot module" in a way very similar to the
multiboot standard. An SIF_ flag is added to announce such package.
This also adds a packing implementation to PV-GRUB.
Signed-Off-By: Samuel Thibault <samuel.thibault@ens-lyon.org>
Keir Fraser [Fri, 11 Dec 2009 08:42:28 +0000 (08:42 +0000)]
PoD: appropriate BUG_ON when domain is dying
BUG_ON(d->is_dying) in p2m_pod_cache_add(), which was introduced in
c/s 20426, is not appropriate, since dom->is_dying is set asynchronously.
For example, MMU_UPDATE hypercalls from qemu and the
DOMCTL_destroydomain hypercall from xend can be issued simultaneously.
This patch also makes p2m_pod_empty_cache() wait using spin_barrier
until any other PoD operation ceases.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 9 Dec 2009 10:59:31 +0000 (10:59 +0000)]
x86-32/pod: fix map_domain_page() leak
The 'continue' in the if() part of the conditional at the end of
p2m_pod_zero_check() was causing this, but there also really is no
point in retaining the mapping after having checked page contents,
so fix it both ways. Additionally there is no point in updating
map[] at this point anymore.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 9 Dec 2009 10:58:52 +0000 (10:58 +0000)]
tools: simplify PYTHON_PATH computation (and fixes for NetBSD)
Doesn't work when build-time python path differs from install-time. Do
we care about this given tools should be packaged/built for the
specific run-time distro?
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Wed, 9 Dec 2009 10:46:11 +0000 (10:46 +0000)]
tmem, xentop: Report a few key per-domain tmem statistics in xentop.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 9 Dec 2009 10:44:56 +0000 (10:44 +0000)]
tmem: reclaim minimal memory proactively
When a single domain is using most/all of tmem memory
for ephemeral pages belonging to the same object, e.g.
when copying a single huge file larger than ephemeral
memory, long lists are traversed looking for a page to
evict that doesn't belong to this object (as pages in
the object for which a page is currently being inserted
are locked and cannot be evicted). This is essentially
a livelock.
Avoid this by proactively ensuring there is a margin
of available memory (1MB) before locks are taken on
the object.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
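A toy sketch of the "keep a margin before locking" idea from the entry above. The names, the list-based page model, and the 4KB page size used to turn 1MB into a page count are all assumptions for illustration; tmem's real eviction logic is not shown here.

```python
MARGIN_PAGES = 256  # 1MB expressed in 4KB pages (page size is assumed)

def ensure_margin(free_pages, evictable, margin=MARGIN_PAGES):
    """Evict from `evictable` (oldest first) until the free count reaches
    the margin or nothing is left to evict. Returns the new free count.

    Intended to run before any object lock is taken, so that the insert
    path never has to search for a victim among pages it has locked.
    """
    while free_pages < margin and evictable:
        evictable.pop(0)   # drop the oldest ephemeral page
        free_pages += 1
    return free_pages
```

Doing the reclaim up front keeps the search for an evictable page out of the locked region, which is where the long list traversals described above occur.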
Keir Fraser [Wed, 9 Dec 2009 10:44:11 +0000 (10:44 +0000)]
libxenlight: implement libxl_set_memory_target
This patch adds a target_memkb parameter to libxl_domain_build_info to
set the target memory for the VM at build time and a new function
called libxl_set_memory_target to dynamically modify the memory target
of a VM at run time. Finally a new command "mem-set" is added to xl
that calls directly libxl_set_memory_target.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 9 Dec 2009 10:43:33 +0000 (10:43 +0000)]
libxenlight: xenstore data path writable by the guest
Make the data path on xenstore writable by the guest
because Citrix pv drivers require it.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 9 Dec 2009 10:42:53 +0000 (10:42 +0000)]
SRAT memory hotplug 2/2: Support overlapped and sparse node memory arrangement.
Currently the xen hypervisor uses nodes to keep the start/end address of
each node. It assumes memory among nodes has no overlap; this is not always
true, especially if we have memory hotplug support in the system.
This patch backports the Linux kernel's memblks to support overlap
among nodes. The memblks will be used both for checking conflicts and
for calculating memnode_shift.
Also, currently if there is no memory populated in a node when the system
boots, the node will be unparsed later, and the corresponding CPU's
numa information will be removed as well. This patch keeps the CPU
information.
One thing to note: currently we calculate memnode_shift with all
memory, including un-populated ranges. This should work if the smallest
chunk is not too small. Another option would be flags in the page_info
structure, etc.
The memnodemap is changed from paddr to pdx, both to save space and
because currently most accesses come from a pfn.
A flag, mem_hotplug, is added if there is a hotplug memory range.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
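Why per-block tracking helps can be seen with a small conflict check. This is a sketch under assumptions: the tuple layout (start, end, node) with an exclusive end and the name find_conflict are illustrative and are not taken from the Xen or Linux sources.

```python
def find_conflict(memblks, start, end):
    """Return the node of the first recorded block that overlaps
    [start, end), or None if the new range is free of conflicts."""
    for blk_start, blk_end, node in memblks:
        # Two half-open ranges overlap when each starts before the
        # other one ends.
        if start < blk_end and blk_start < end:
            return node
    return None
```

If node 0 owns [0x0000, 0x1000) and [0x3000, 0x4000) while node 1 owns [0x2000, 0x3000), a single start/end pair for node 0 would cover node 1's memory, whereas checking individual blocks reports no conflict between them.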
Keir Fraser [Wed, 9 Dec 2009 10:41:37 +0000 (10:41 +0000)]
SRAT memory hotplug 1/2: Revert 20053:ebb07c5934c8.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Tue, 8 Dec 2009 14:14:27 +0000 (14:14 +0000)]
hvm: Share ASID logic between VMX and SVM.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 8 Dec 2009 10:33:08 +0000 (10:33 +0000)]
hvm: Pull SVM ASID management into common HVM code where it can be shared.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 8 Dec 2009 07:55:21 +0000 (07:55 +0000)]
Track free pages live rather than count pages in all nodes/zones
Trying to fix a livelock condition in tmem that occurs
only when the system is totally out of memory requires
the ability to easily determine if all zones in all
nodes are empty, and this must be checked at a fairly
high frequency. So to avoid walking all the zones in
all the nodes each time, I'd like a fast way to determine
if "free_pages" is zero. This patch tracks the sum
of the free pages in all nodes/zones. Since I think
the value is modified only when heap_lock is held,
it need not be atomic.
I don't know this for sure, but suspect this will be
useful in other future memory utilization code, e.g.
page sharing.
This has had limited testing, though I did drive free
memory down to zero and up and down a few times with
debug on and no asserts were triggered.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
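The idea of keeping a running total instead of walking every node and zone can be sketched as below. The class and method names are hypothetical, and the sketch is single-threaded; in the hypervisor the counter would be updated under heap_lock, as the message above suggests.

```python
class Heap:
    """Toy heap that tracks free pages per (node, zone) and in total."""

    def __init__(self, free_by_zone):
        # free_by_zone: {(node, zone): number of free pages}
        self.free_by_zone = dict(free_by_zone)
        self.total_free = sum(self.free_by_zone.values())

    def alloc(self, node, zone, n):
        if self.free_by_zone[(node, zone)] < n:
            return False
        self.free_by_zone[(node, zone)] -= n
        self.total_free -= n   # keep the running total in step
        return True

    def free(self, node, zone, n):
        self.free_by_zone[(node, zone)] += n
        self.total_free += n

    def is_empty(self):
        # O(1) check instead of summing over all nodes and zones.
        return self.total_free == 0
```

The invariant worth asserting in debug builds is that total_free always equals the sum of the per-zone counts.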
Keir Fraser [Tue, 8 Dec 2009 07:51:30 +0000 (07:51 +0000)]
VT-d: per-iommu domain-id
Currently, xen uses a shared iommu domain-id across all the VT-d units
in the platform. The number of iommu domain-ids (NR_DID, e.g. 256)
supported by each VT-d unit is reported in the Capability register. The
limitation of the current implementation is that it can support at most
NR_DID domains with VT-d in the entire platform, even though the
platform can support N * NR_DID (where N is the number of VT-d
units). Imagine a platform with several SR-IOV NICs, each of which
supports 128 VFs: the number of domains can exceed NR_DID.
This patch implements iommu domain-id management per iommu (VT-d
unit), and hence removes the above limitation. It removes the global
domain-id bitmap and instead uses a domain-id bitmap in struct iommu. It
also introduces an array to map guest domain-ids to iommu domain-ids,
which is used to look up the iommu domain-id when flushing the context
cache or IOTLB. When a device is
assigned to a guest, choose an available iommu domain-id from the
device's iommu, and record the guest domain id in the domain-id mapping
array. When a device is deassigned from a guest, clear the domain-id
bit in the domain-id bitmap and clear the corresponding entry in the
domain-id map array if there are no other devices under the same iommu
owned by the guest.
Signed-off-by: Weidong Han <weidong.han@intel.com>
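The per-unit bookkeeping described above can be sketched as follows. This is an illustration only: the class, its field names, and the use of a dictionary where the patch uses an array are assumptions, and hardware details such as reserved domain-ids are ignored.

```python
class Iommu:
    """Toy model of one VT-d unit with its own domain-id space."""

    def __init__(self, nr_did):
        self.used = [False] * nr_did  # per-unit domain-id bitmap
        self.did_of = {}              # guest domain id -> iommu domain-id
        self.devices = {}             # guest domain id -> devices assigned

    def assign_device(self, guest_domid):
        if guest_domid not in self.did_of:
            did = self.used.index(False)  # ValueError if the unit is full
            self.used[did] = True
            self.did_of[guest_domid] = did
            self.devices[guest_domid] = 0
        self.devices[guest_domid] += 1
        return self.did_of[guest_domid]

    def deassign_device(self, guest_domid):
        self.devices[guest_domid] -= 1
        if self.devices[guest_domid] == 0:
            # Last device of this guest under this unit: release the id.
            did = self.did_of.pop(guest_domid)
            self.used[did] = False
            del self.devices[guest_domid]
```

Because each unit has its own bitmap, two units with NR_DID ids each can serve up to 2 * NR_DID guests between them, which is the limit the message describes.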
Keir Fraser [Tue, 8 Dec 2009 07:49:54 +0000 (07:49 +0000)]
xend: Add keymap to vfb config for existing hvm guests
I submitted a patch a while back to add keymap to vfb config for hvm
guests. This patch works fine for new config (xm create|new) but not
existing, managed guests. To cover the latter case I've introduced a
validator method in XendConfig.
Signed-off-by: Jim Fehlig <jfehlig@novell.com>
Keir Fraser [Tue, 8 Dec 2009 07:48:45 +0000 (07:48 +0000)]
Make tsc_mode=3 (pvrdtscp) work correctly.
Initial tsc_mode patch contained a rough cut at pvrdtscp mode. This
patch gets it working correctly. For the record, pvrdtscp mode allows
an application to obtain information from Xen to descale/de-offset
a physical tsc value to obtain "nsec since VM start". Though the
raw tsc value may change across migration due to different Hz rates
and different start times of different physical machines, applying
the pvrdtscp algorithm to a raw tsc value guarantees that the result
will always be both a fixed known rate (nanoseconds) and monotonically
increasing. BUT, pvrdtscp will only be fast on physical machines that
support the rdtscp instruction AND on which tsc is "safe"; on other
machines both the rdtsc and rdtscp instructions will be emulated.
Also note that when tsc_mode=3 is enabled, tsc-sensitive applications
that do NOT implement the pvrdtscp algorithm will behave incorrectly.
So, tsc_mode=3 should only be used when all apps are either
tsc-resilient or pvrdtscp-modified, and only has a performance
advantage on very recent generation processors.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
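The "descale/de-offset" step described above amounts to simple arithmetic, sketched here. The parameter names tsc_offset and tsc_hz are hypothetical, how Xen actually exposes such parameters to the application is not shown, and plain integer division stands in for whatever fixed-point scaling a real implementation would use.

```python
NSEC_PER_SEC = 1_000_000_000

def tsc_to_nsec(raw_tsc, tsc_offset, tsc_hz):
    """Convert a raw TSC reading into nanoseconds since VM start.

    tsc_offset: raw TSC value that corresponds to VM start on the current
                physical machine (may be negative after a migration).
    tsc_hz:     TSC frequency of the current physical machine.
    """
    return (raw_tsc - tsc_offset) * NSEC_PER_SEC // tsc_hz
```

A guest that has run 5 seconds on a 2 GHz host and then continues on a 3 GHz host with a different offset sees raw TSC values that are unrelated, but the converted values keep counting nanoseconds from the same origin.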
Keir Fraser [Tue, 8 Dec 2009 07:47:52 +0000 (07:47 +0000)]
libxenlight: implement cdrom insert/eject
This patch implements functions in libxenlight to change the cdrom in
a VM at run time and to handle cdrom eject requests from guests.
This patch adds two new commands to xl: cd-insert and cd-eject; it
also modifies xl to handle cdrom eject requests coming from guests
(actually coming from qemu).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 8 Dec 2009 07:45:15 +0000 (07:45 +0000)]
fs-backend: add a backend cleanup function
This patch implements a backend cleanup function in fs-backend so that
when the connection to the frontend is closed we don't leak nodes on
xenstore.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 8 Dec 2009 07:44:45 +0000 (07:44 +0000)]
libxenlight: minimal vfs support
This patch adds minimal support for fs-backend and minios' fs-front
to libxenlight:
- it creates a vfs directory on the stubdom's xenstore
device path and allows the stubdom to write to it;
- it doesn't try to cleanly shut down the vfs backend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 7 Dec 2009 14:10:27 +0000 (14:10 +0000)]
Keir Fraser [Sat, 5 Dec 2009 12:32:34 +0000 (12:32 +0000)]
x86_32: Fix build after 20575:0930d17589a6
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 5 Dec 2009 12:30:46 +0000 (12:30 +0000)]
libxenlight: physmap slack for pv domains
Contemplate a memory space slack for PV domains,
since they do ballooning (or flipping network rx)
and need some extra room in their pfn space.
Note that this does not allocate any extra memory
to the domain, it simply extends the physmap with
some extra room for "bounce buffering back" pfn's
that are yielded to dom0.
The default slack is set at 8MB.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Sat, 5 Dec 2009 12:29:48 +0000 (12:29 +0000)]
Keir Fraser [Fri, 4 Dec 2009 07:11:44 +0000 (07:11 +0000)]
libxenlight: get state for one domain
Simple function to get the dominfo state of a single domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:11:06 +0000 (07:11 +0000)]
libxenlight: domain resume
Added libxenlight implementation for resume domain.
This brings back a cooperative pv domain from the
shutdown state after save, enabling checkpointing.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:10:22 +0000 (07:10 +0000)]
libxenlight: Destroy device model only for domains that have it
Destroy device model only for domains that have it.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:09:44 +0000 (07:09 +0000)]
libxenlight: avoid writing empty values to xenstore
Prevent segmentation fault caused by empty values
in key-value pairs for the /vm/ subdirectory
when restoring a pv domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
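One possible guard against the empty-value problem above is to filter the pairs before they are written. The sketch below is an assumption about the general approach, not the actual libxenlight change, and the helper name is hypothetical.

```python
def drop_empty_values(pairs):
    """Keep only key-value pairs whose value is neither None nor ''.

    Hypothetical helper: models skipping empty entries before they
    would be written under the /vm/ subdirectory.
    """
    return [(key, value) for key, value in pairs if value not in (None, "")]
```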
Keir Fraser [Fri, 4 Dec 2009 07:06:47 +0000 (07:06 +0000)]
libxenlight: disk and nic destroy calls
Expose disk and nic device destroy calls.
Also remove the obsolete device shutdown calls.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:03:45 +0000 (07:03 +0000)]
libxenlight: refactor libxl destroy code
Refactor libxl device destroy code. Abstract function
waiting for the watch on the state node to fire.
Create a generic device delete function.
Only a single LIBXL_DESTROY_TIMEOUT elapses when
waiting for destruction of all the devices of a
domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
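The "single timeout for all devices" idea above can be modelled with one shared deadline, so that each wait only receives whatever time is left. This is a minimal sketch under assumed names; `wait_for_all` and the waiter callables are hypothetical and do not mirror the libxl code.

```python
import time


def wait_for_all(waiters, timeout, clock=time.monotonic):
    """Run each (name, wait) pair under one shared deadline.

    Each wait(remaining) returns True if the device went away within
    `remaining` seconds. Returns the names that did not finish in time.
    """
    deadline = clock() + timeout
    pending = []
    for name, wait in waiters:
        # Every device gets only the time still left on the one deadline.
        remaining = max(0.0, deadline - clock())
        if not wait(remaining):
            pending.append(name)
    return pending
```

With a per-device timeout the worst case grows with the number of devices; with a shared deadline it stays bounded by the single timeout value.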
Keir Fraser [Fri, 4 Dec 2009 07:02:49 +0000 (07:02 +0000)]
libxenlight: fix GC when cloning contexts
Provide a function to clone a context. This is necessary
because simply copying the structs will eventually
corrupt the GC: maxsize is updated in the cloned context
but not in the originating, yet they have the same array
of referenced pointers alloc_ptrs.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
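The failure mode described above can be illustrated with a toy model. It is only a sketch: the class below is hypothetical, borrows the names maxsize and alloc_ptrs from the changeset text, and uses a Python list to stand in for the C pointer array.

```python
import copy


class Ctx:
    """Toy context: a tracking array plus a separately stored size."""

    def __init__(self, maxsize=2):
        self.alloc_maxsize = maxsize
        self.alloc_ptrs = [None] * maxsize

    def track(self, ptr):
        for i in range(self.alloc_maxsize):
            if self.alloc_ptrs[i] is None:
                self.alloc_ptrs[i] = ptr
                return
        # Array full: grow it and update this context's own size field.
        self.alloc_ptrs.extend([None] * self.alloc_maxsize)
        self.alloc_ptrs[self.alloc_maxsize] = ptr
        self.alloc_maxsize *= 2


def naive_copy(ctx):
    # Equivalent of a plain struct copy: the array is shared.
    return copy.copy(ctx)


def clone(ctx):
    # Clone with its own, empty tracking array.
    new = copy.copy(ctx)
    new.alloc_ptrs = [None] * new.alloc_maxsize
    return new
```

After a naive copy, growth through the copy leaves the original with a stale size for an array it still shares; the clone keeps the two contexts' bookkeeping independent.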
Keir Fraser [Fri, 4 Dec 2009 07:00:25 +0000 (07:00 +0000)]
xend: Fix parameters to PyArg_ParseTupleAndKeywords()
The kwd_list parameter of PyArg_ParseTupleAndKeywords() must be a
NULL-terminated list.
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Fri, 4 Dec 2009 06:59:33 +0000 (06:59 +0000)]
x86: XENMEM_add_to_physmap should propagate errors from guest_physmap_add_page().
Authored-by: David Lively
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:58:08 +0000 (06:58 +0000)]
Add keyhandler 'g' to print all active grant table entries.
Authored-By: Robert Phillips
Signed-off-By: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:51:53 +0000 (06:51 +0000)]
libxenlight: Get rid of the dependency on the LIBCONFIG_SOURCE directory.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
Keir Fraser [Fri, 4 Dec 2009 06:50:46 +0000 (06:50 +0000)]
libxenlight: Delete dep files on 'make clean', and include them in Makefile rules.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Dec 2009 13:52:02 +0000 (13:52 +0000)]
grant-tables: do not fail attempts to GNTTABOP_set_version to the current version.
...even if there are active grants.
This triggers when checkpointing a guest, which essentially resumes
without actually having gone through the suspend so the domain is
already latched to v2 inside Xen.
Also return the current actual version on success and failure. Not
terribly useful with only 2 options but is more robust to future
developments.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
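The behaviour described above can be modelled as below. This is a sketch, not the hypervisor code: the class, the specific error codes, and the tuple return are assumptions made for illustration.

```python
import errno


class GrantTable:
    """Toy model of a domain's grant table version state."""

    def __init__(self, version=1, active_grants=0):
        self.version = version
        self.active_grants = active_grants

    def set_version(self, requested):
        """Return (rc, current_version); the version is reported
        on both success and failure."""
        if requested not in (1, 2):
            return -errno.EINVAL, self.version   # assumed error code
        if requested == self.version:
            # Setting the current version is a no-op and succeeds,
            # even if there are active grants.
            return 0, self.version
        if self.active_grants:
            return -errno.EBUSY, self.version    # assumed error code
        self.version = requested
        return 0, self.version
```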
Keir Fraser [Thu, 3 Dec 2009 13:51:20 +0000 (13:51 +0000)]
xend: Add GPL license stanza to MemoryPool.py
Signed-off-by: James Song (Wei) <jsong@novell.com>
Keir Fraser [Thu, 3 Dec 2009 13:50:43 +0000 (13:50 +0000)]
Remus: fall back to xenstore if necessary
This is primarily for pvops until it gets a dedicated suspend
event channel.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Thu, 3 Dec 2009 13:50:14 +0000 (13:50 +0000)]
Remus: fix shadow memory allocation, broken by 20558:4ed3b9b1de3f
This approach is perhaps a little cleaner than directly calling
balloon.free.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 18:46:14 +0000 (18:46 +0000)]
x86 hvm: fix up the unified HAP nested-pagefault handler.
A guest PFN may have been marked dirty and switched to p2m_ram_rw by
another CPU between the VMEXIT and lookup in this handler, so
we can't just check for p2m_ram_logdirty. Also, handle_mmio
doesn't handle passthrough MMIO.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:43:28 +0000 (18:43 +0000)]
xentop: Allow full domain name display
Add a '-f' option to xentop to allow the full domain name to be
displayed. This is the original behavior which can cause the display
to be unaligned. Customers have requested this because only the
trailing characters of their domain names are unique and therefore
cannot be distinguished when the display is limited to a 10 character
width.
Signed-off-by: Charles Arnold <carnold@novell.com>
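The effect of the '-f' option above can be sketched as follows. The function is hypothetical and assumes the default column keeps the leading 10 characters, which is what makes names with unique trailing characters indistinguishable.

```python
NAME_WIDTH = 10  # column width named in the changeset


def format_name(name, full=False):
    """Format a domain name for display.

    Default: truncate and pad to a fixed column so the table stays aligned.
    full=True (the '-f' behaviour): show the whole name, possibly
    misaligning the display.
    """
    if full:
        return name
    return name[:NAME_WIDTH].ljust(NAME_WIDTH)
```

For example, "customer-web-01" and "customer-web-02" both render as "customer-w" by default and only differ when the full name is requested.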
Keir Fraser [Wed, 2 Dec 2009 18:42:36 +0000 (18:42 +0000)]
libxenlight: fix multiple xenstore watches problem
This patch fixes the multiple xenstore watches problem in libxenlight
by opening a new xenstore connection to set and read temporary watches on
the device state nodes. This way they don't interfere with other
long-running watches.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:03 +0000 (18:42 +0000)]
libxenlight: use watch and select in libxl_wait_for_device_model
This patch reimplements libxl_wait_for_device_model using a xenstore
watch and a select loop.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
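The general shape of a "watch plus select loop" with a timeout can be sketched as below. This is not the libxl implementation: a POSIX pipe stands in for the xenstore watch file descriptor, newline-delimited state strings stand in for watch events, and the function name is hypothetical.

```python
import os
import select
import time


def wait_for_state(fd, wanted, timeout):
    """Block until `wanted` arrives as a line on fd, or the timeout expires.

    Returns True if the state was seen, False on timeout or EOF.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep until the descriptor is readable instead of polling.
        readable, _, _ = select.select([fd], [], [], remaining)
        if not readable:
            return False
        chunk = os.read(fd, 4096)
        if not chunk:
            return False  # writer closed the descriptor
        buf += chunk
        # Keep any incomplete trailing line for the next read.
        *lines, buf = buf.split(b"\n")
        if wanted in lines:
            return True
```

Compared with a sleep-and-reread loop, the caller wakes only when an event arrives and never waits longer than the overall timeout.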
Keir Fraser [Wed, 2 Dec 2009 18:41:31 +0000 (18:41 +0000)]
libxenlight: fix dm_xenstore_record_pid
The function dm_xenstore_record_pid is executed by a child of the main
process and therefore shouldn't use the same xenstore connection:
currently it opens a new connection but still uses the old one.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 13:45:35 +0000 (13:45 +0000)]
xenstat: Fixes for 20528:e6e3bf767d16 (stats for dom0 network bonding)
In the above c/s I introduced dom0 statistics for the case where network
bonding is used. The indentation did not match the xenstat C codebase, and
the logic has also been modified: fields parsed from /proc/net/dev that we
don't care about are now skipped by passing NULLs, since we only care about
{tx|rx}{bytes,packets,errs,drops}. The dom0 statistics adjustment was also
fixed to include {tx|rx}{drop,errs} for dom0 (the previous version of my
patch did not have this code applied).
Signed-off-by: Michal Novotny <minovotn@redhat.com>
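Extracting only the eight counters of interest from a /proc/net/dev line can be sketched as below. The function and dictionary key names are hypothetical; the field positions assume the standard Linux layout of eight receive columns followed by eight transmit columns.

```python
def parse_net_dev_line(line):
    """Parse one interface line from /proc/net/dev.

    Returns (iface, counters) with only bytes, packets, errs and drop
    for each direction; all other columns are ignored.
    """
    iface, _, rest = line.partition(":")
    fields = [int(x) for x in rest.split()]
    if len(fields) < 16:
        raise ValueError("unexpected /proc/net/dev line: %r" % line)
    return iface.strip(), {
        "rbytes": fields[0], "rpackets": fields[1],
        "rerrs": fields[2], "rdrop": fields[3],
        # fields[4:8] are fifo, frame, compressed, multicast (unused)
        "tbytes": fields[8], "tpackets": fields[9],
        "terrs": fields[10], "tdrop": fields[11],
    }
```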
Keir Fraser [Wed, 2 Dec 2009 13:43:37 +0000 (13:43 +0000)]
xend, vt-d: do not reserve vtd_mem if iommu is not enabled
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Wed, 2 Dec 2009 13:39:07 +0000 (13:39 +0000)]
vmx: During task-switch, read instr-len VMCS field only when valid.
Otherwise we can crash on the BUG_ON() in __get_instruction_length().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>